perm filename MISC[HPP,DBL]2 blob sn#197249 filedate 1976-01-18 generic text, type C, neo UTF8
Step back for a moment and look over what's been done.
Notice that this whole discussion is not really specific to doing math research.
The design for the concept-grower can be used to implement
almost any kind of theory formation system.
The only changes from one program to another would be the particular
starting concepts, the particular facets each concept could have, and,
of course, the individual heuristics. So the overall design of the
system could stay the same. Perhaps if several such systems were created,
they could be run together and cooperate.

But how should we choose the particular scientific field for the theory formation?
Why was mathematics chosen?
Well, for one thing, mathematicians 
don't have to worry about uncertainties in their data, as
one would with, e.g., a mass spectrometer. A personal reason was that
I wouldn't have to elicit the heuristics from experts outside the AI field; I
could use introspection, so that problem is eliminated. But the biggest
distinction is that each ⊗4natural⊗* science tries to explain observed
data; mathematics is the only science not limited to explaining nature.
Ed Feigenbaum has told me that that was precisely the reason why he and
Josh Lederberg decided to work in chemistry rather than graph theory a decade
ago. 
So the same facts lend themselves to very different interpretations.
"Choosing a good domain" is one of the topics we'll discuss on Thursday.


Well, in any event, that was how AM was designed and created. You now know
all the essentials of its representations and algorithms.
What I'd like to do in the remaining time is go over a brief example, to
help you get a feeling for
how the current version of AM does its stuff. Tomorrow, if you want,
you can work with AM during the morning demo period.

This is an  excerpt from a session with AM, as it first discovers Cardinality:

<Cardinality slide>

I'll read through it once to show you that AM really performs magic, then
we'll peek behind the curtains to see that AM is really no more than the
straightforward data-structure expander that I've been describing.

(1) After testing random structures to see if they are equal, AM concludes that
it would be worthwhile to generalize the predicate EQUAL.
The reason is that so few of these random trials were successful.

(2) Two generalizations of EQUAL are constructed. The first is the
relationship "has the same length as", and the second is "has the same
first element as".

(3) After testing random structures to see if they have the SAME-LENGTH,
AM decides to try to find a canonical form for all structures. This would
mean that two structures were of the same length if and only if their
canonical forms were identically equal.

(4) AM comes up with the idea of making this conversion by (i) converting
each structure to a BAG, and (ii) replacing each element by the constant T.
Thus the canonical form of
all structures of length 10 is a
BAG of 10 T's.
AM decides to restrict various operations like Union and Intersection
to such canonical structures, and see what happens.

The user tells AM that such canonical structures should be called NUMBERS.



Well, now that you've seen what AM says, let's see how it does it.

To start off slow, let's examine the ways that AM found examples of
things which were EQUAL. The central task was to fill in the
EXAMPLES facet of the EQUAL concept. To do that, AM gathered
heuristics and procedures from various places, such as the
FILLIN facet of the EXAMPLES concept. One of these suggestions
was to instantiate the definition. Here is the recursive definition
of EQUAL: <defn of EQUAL slide>.
An easy way to satisfy this definition would be simply to satisfy the base step.
To do that, the two arguments must be identical atoms. So some examples
like (T,T), (NIL,NIL) are produced.
A more sophisticated way is to build up two list structures, whose CARS
and CDRS are objects known already to be EQUAL. So if the CARS and CDRS are
both made to be NIL, we have the example ((NIL),(NIL)). A few examples are
constructed in this way. An entirely different approach is suggested by
the  ACTIVITY concept. It says that, when filling in examples of any activity
or predicate, we can randomly instantiate the arguments, run the concept,
and observe the results. So AM randomly picks pairs of objects and
sees if they satisfy EQUAL. This is where the empirical data comes from that
tells AM how rarely EQUAL is satisfied.
There are a couple of other ways of deriving examples, but let's move on.
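
To make this concrete, here is a minimal sketch of those two example-finding
tactics -- satisfying the base step directly, and randomly instantiating the
arguments -- written in present-day Common Lisp rather than AM's actual
Interlisp; the predicate and helper names (MY-EQUAL, RANDOM-OBJECT, and so on)
are mine, not AM's.

;; Sketch only -- not AM's code.  The recursive predicate being probed:
(defun my-equal (x y)
  (or (and (atom x) (atom y) (eq x y))           ; base step: identical atoms
      (and (consp x) (consp y)
           (my-equal (car x) (car y))            ; recurse in the CAR direction
           (my-equal (cdr x) (cdr y)))))         ; recurse in the CDR direction

;; Tactic 1: satisfy the base step directly -> examples like (T T), (NIL NIL).
(defun base-step-examples ()
  (loop for sym in '(t nil a b) collect (list sym sym)))

;; Tactic 2: random instantiation.  Build random small structures, run the
;; predicate on random pairs, and record how rarely it succeeds.
(defun random-object (&optional (depth 2))
  (if (or (zerop depth) (zerop (random 2)))
      (nth (random 4) '(t nil a b))
      (cons (random-object (1- depth)) (random-object (1- depth)))))

(defun empirical-success-rate (trials)
  (/ (loop repeat trials
           count (my-equal (random-object) (random-object)))
     trials))

Run over a few hundred trials, EMPIRICAL-SUCCESS-RATE comes back small, which
is the kind of empirical observation that prompts AM to generalize EQUAL.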

The first real magic trick is turning equality into equal-length.
To see how AM did that, we must look at the recursive definition
of EQUAL.
<Defn of EQUAL slide>.
Two objects are equal if they are identical atoms, or if they are both
lists and their cars are equal and their cdrs are equal.
This is the recursive definition that AM wants to generalize.
Notice that it is composed of a base step <point> and two
recursive calls which must both succeed.
In such a case, an easy way to generalize is to assume that one of them
will always succeed. That is, force the predicate to recurse successfully
in only one direction, either the CARs or the CDRs, but not both.
In particular, by doing away with the test in the CAR direction
<defn OVERLAY slide>, we get a predicate which counts down two lists,
and returns T if and only if they both become empty at the same time.
That is, iff they have the same length.
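
In code, that move amounts to deleting one conjunct of the recursive
definition. A minimal sketch, again in Common Lisp rather than AM's Interlisp
(the name SAME-LENGTH-P is mine):

;; Sketch only.  Dropping the CAR-direction test from the EQUAL definition
;; leaves a predicate that merely counts down the two lists in parallel:
(defun same-length-p (x y)
  (or (and (atom x) (atom y) (eq x y))          ; base step unchanged
      (and (consp x) (consp y)
           (same-length-p (cdr x) (cdr y)))))   ; recurse only on the CDRs

;; (same-length-p '(a b c) '(1 2 3))  =>  T
;; (same-length-p '(a b)   '(1 2 3))  =>  NIL

Dropping the CDR-direction call instead gives the other generalization
mentioned earlier, "has the same first element as".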

The second magic trick was deriving the canonical form of a structure.
How did AM know it should be a BAG whose  elements are all T's?
AM made this inference by experimentation. First, it found some examples
of things which did and didn't satisfy EQ1, which it was now supposed
to call SAME-LENGTH. Experiment number 1 was to convert the arguments
from one type of structure to another, and see if that changed their value
under the predicate SAME-LENGTH. It turned out not to, which meant that
one single canonical type of structure could be chosen. But there are
4 kinds it could be: a set, a list, a bag, or an ordered set.
The particular kind depends on whether altering the order of the
elements in the arguments makes any difference to  SAME-LENGTH,
and a similar question about adding multiple occurrences of the same
element. Since changing the order makes no difference, the canonical
structure type must be unordered, either BAG or SET. Since adding multiple
elements does make a difference, it must be of type BAG or LIST.
So the canonical type is BAG.

Next, AM notices an explicit message that SAME-LENGTH does not recurse in
the CAR direction. That is enough evidence to warrant testing whether
or not the value cells of any of these elements are ever accessed.
They aren't, so AM replaces them all by a single, fixed value: T.

Putting this all together, we see the transformation as: convert the
structure to a BAG, and substitute T for each element.
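
A minimal sketch of that transformation (the function name CANONIZE is my own
invention; AM's canonical BAG is represented here as a plain list, which is
harmless since every element is the same constant T):

;; Sketch only.  Keep one T per top-level element; order is irrelevant and
;; multiplicity is preserved, so what survives is exactly the length.
(defun canonize (structure)
  (mapcar (constantly t) (coerce structure 'list)))  ; accepts lists, vectors, ...

;; (canonize '(a b c))   =>  (T T T)
;; (canonize #(x y z))   =>  (T T T)
;; Two structures are SAME-LENGTH iff their canonical forms are EQUAL:
;; (equal (canonize '(a b c)) (canonize '(p q r)))   =>  T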

Let's pause and take stock of what assumptions all this depends on.
First, notice that we view mathematics as an empirical science, where
discoveries are made by experimenting with already-known concepts, and
noticing new relationships. These are the conjectures which later may become
theorems. But often they can't be phrased concisely with the existing
terminology. This is a clue that some new definitions should be made.
Or perhaps some concept is too rigid, too narrow to permit some conjecture
to hold. Then we can modify it, generalize or relax it in some way.
The space of concepts and possible operations on them is too immense to
even think of walking around in it unguided. The heuristic rules
transform this so that the only legal moves are those which are
locally plausible, those which can be justified by some general heuristic.
But even this isn't enough; some meta-heuristics are needed to decide
which of all the legal heuristics to apply in a given situation, and how.

Let me draw an analogy between this process and that of writing proposals
to ARPA.
Both of these activities can be viewed as growing a tree.
Each node there represents a funding proposal, each node here a discovery.
Each arc there is labelled with ARPA's comments to the preceding proposal,
which were taken into account in writing this one.
Each arc here is labelled with a heuristic rule which was applied to the preceding
concept discovered.
The top nodes
there are the primitive axioms and known theorems; the top nodes here are the
concepts which we assume to be primitive and already known.
The goal there is a given statement, here a given discovery.

One can take this view of a proof, and use it to write a simple proof
generator; namely, we take some axioms, and repeatedly apply some
rule of inference. This grows a tree of true, provable statements.
The best theorem-provers today use this paradigm, but use some heuristics
to choose which rules to apply in each situation.

Maybe we can do the same kind of thing here, to write a simple discovery-generator.
For theorem-proving, we first have to specify precisely what the primitive
statements are, and all the rules of inference. For discovering, we must specify
the initial core of concepts, and list all the heuristics that we can legally
use. Then we repeatedly select some heuristic, and try to apply it. Eventually,
for example, this scheme would generate this path, and uncover prime numbers.
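
As a caricature, that blind discovery-generator is nothing more than the
following loop; the concept and heuristic arguments here are stand-ins of my
own, not AM's actual data structures (each "heuristic" is just a function that
takes a concept and returns a new concept, or NIL if it doesn't apply):

;; Sketch of the blind scheme only -- no judgment about which heuristic to
;; apply, to which concept, or when.
(defun blind-discovery-generator (concepts heuristics steps)
  "Repeatedly pick a heuristic and a concept at random, try to apply the
heuristic, and keep whatever new concept (if any) results."
  (dotimes (i steps concepts)
    (let* ((heuristic (nth (random (length heuristics)) heuristics))
           (concept   (nth (random (length concepts))   concepts))
           (new       (funcall heuristic concept)))
      (when new (push new concepts)))))

Everything interesting lies in replacing those two calls to RANDOM with
judgment, which is exactly the point of the next paragraph.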

Well, that's a pretty awful scheme, almost like blind search. What do theorem-provers
do to improve the quality of their proofs? They add heuristic domain-specific
information, to help select a meaningful rule, to apply the best rule to the
best statement at the right time. For us, that means knowing which heuristic
rule to apply, when, where, and why.

Another powerful scheme theorem-provers use is that of working out a plan in
a higher-level space, using abstract simplifications of the axioms and the
operators. 
Closely related to this is the use of analogic models (e.g., diagrams in
geometry).
I'll show in a minute that, for automating discovery, this idea corresponds
to the concept you and I typically call our ⊗4intuition⊗*.



Why choose Mathematics as a domain for such "theory formation"?
How one can analyze a given discovery, "explain" it.
How to invert this process and ⊗4generate⊗* new discoveries
Representing math knowledge as cooperating experts: BEINGS.
What aspects of theory formation are being ignored? emphasized?
Some successes and some problems that ⊗4you⊗* might encounter.


I'm claiming that theory formation can be considered a heuristic search.
That is, a continual expansion of a tree of concepts and relationships,
guided by some judgmental criteria, some heuristic rules of thumb.

But Artificial Intelligence people have been saying for years that
problem-solving is just heuristic search, that theorem-proving is just heuristic
search, etc.  Is this the same kind of process that everyone else always means?

Well, let's look at Chess. A typical heuristic search program would
maintain a tree of board positions, emanating from the current board
situation. The operators are the legal moves, and the search is guided
by (1) heuristic knowledge of what constitutes a plausible move, and
(2) at each state, some  static evaluation function estimating the
value of that situation. So we put a value on each node, and expand only
the most promising nodes as we search further and further into the future.
From a good node, we don't consider all legal moves, but only those which
our heuristics tell us are plausible.
After a certain length of time, we stop and back up our values to this
level, and use them to decide which move to make.
A typical operator is "moving a king one square in any direction".
A typical evaluation function would consider the amount of material each side
possesses, the number of men under attack, etc. A typical heuristic might
say to try to fork the enemy king and queen with a knight.
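
A skeleton of that kind of search, with PLAUSIBLE-MOVES, APPLY-MOVE, and
STATIC-VALUE left as toy stand-ins of my own for the move heuristics and the
evaluation function just described:

;; Skeleton only -- the three stubs below are placeholders, not a real
;; chess program.
(defun plausible-moves (position) (when (< position 3) '(1 -1)))
(defun apply-move (position move) (+ position move))
(defun static-value (position) position)

(defun backed-up-value (position depth)
  "Expand only plausible moves down to DEPTH, then back the static values
up the tree (negamax convention: each ply negates the value)."
  (let ((moves (plausible-moves position)))
    (if (or (zerop depth) (null moves))
        (static-value position)                  ; leaf: static evaluation
        (loop for move in moves
              maximize (- (backed-up-value (apply-move position move)
                                           (1- depth)))))))

(defun choose-move (position depth)
  "Pick the plausible move whose resulting position backs up the best value;
DEPTH should be at least 1."
  (loop with best-move = nil and best-value = nil
        for move in (plausible-moves position)
        for value = (- (backed-up-value (apply-move position move) (1- depth)))
        when (or (null best-value) (> value best-value))
          do (setf best-move move best-value value)
        finally (return best-move)))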

This is chess problem-solving. What would it mean to do theory-formation
in chess? We would be searching a space of concepts now, deriving notions
like pin, fork, balance, etc. Perhaps some new notion would prove to be
valuable -- maybe the moments of the pawns about the main diagonals.
The operators are now the heuristic rules which allow us to grow a new
concept, and there are some meta-level heuristics floating around to
help guide this search. There is no simple static evaluation function;
each concept must be tried and tested over long periods of time,
in several games. We can only decide in retrospect whether a new
concept will be worth bearing in mind as we play the game.
A typical operator is "consider the simultaneous creation of 2 existing 
favorable situations". This might lead to the concept of a fork, and
later to the concept of creating a fork and a discovered check, etc.
The evaluation function would have to be of the form "provisionally
allow any new concept to exist, and quickly try to find 1 interesting use
for it". A typical heuristic might say "if you never seem to make any very
dramatic moves, then consider applying some operator which creates a concept
which is rare but has high payoff". For example, this would point to
operators which can create new concepts involving coincidences (like the
discovered pin) or sacrifices of important pieces.

I chose to have my system try to do research in  mathematics,
partly because I have some experience in that activity, and could
therefore extract the needed heuristics by introspection.
Another advantage of mathematics over the natural sciences is the
absence of experimental apparatus; one needn't go to a lab to perform
any experiments that need doing, since one can examine any items he has theoretically.
There is no errorful data to contend with. Finally, mathematics is quite
attractive as a concept-growing domain because it is not limited to
explaining a given natural phenomenon. Of all the sciences, mathematics is
the one where this tree is most likely to bear fruit no matter which path we
take. In chemistry or physics, for example, one would not expect many alternative
formulations of the fundamental concepts we call mass, force, and bond saturation.

AM CONJECTURE

AM has helped inspire a characterization of those natural numbers
which have an abnormally large number of divisors.
We shall call a number n Maximally-divisible iff all smaller numbers have
fewer divisors.
The function d(n) is the number of divisors of the natural number n.
Then the AM Conjecture states that
all numbers of the following form are maximally divisible:
n = 2^a1 * 3^a2 * 5^a3 * ... * Pk^ak, for some k, and
for all i and j ≤ k, (ai + 1)/(aj + 1) is "close" to log(Pj)/log(Pi),
where "close" means
"as close as possible for integers". More technically, it means
that adding +1 or -1 to any exponent, to any of the ai, does not make any
ratio closer to its ideal quotient of the two logs.
For example, such a number is
n = 2^8 * 3^5 * 5^3 * 7^2 * 11^2 * 13^1 * 17^1 * 19^1 * 23^1 * 29^1 * 31^1 * 37^1 * 41^1 * 43^1 * 47^1 * 53^1.
The progression of its exponents+1, namely (9 6 4 3 3 2 2 2 2 2 2 2 2 2 2 2),
is about as close as one can get to satisfying the "logarithm"
constraint.
By that I mean that 9/6 is close to log(3)/log(2); that 2/2 is close to
log(47)/log(43), etc.  Changing any exponent by plus or minus 1 would make
those ratios worse than they are.
This  number n is about 25 billion, and has about 4 million divisors.
The average number in the neighborhood of n has about
2^(loglog n) divisors (about 9, for this n).
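
A small sketch of the bookkeeping behind those claims, using the definitions
above (the brute-force search is mine, for illustration only; it is far too
slow to reach numbers anywhere near the example above):

;; d(n), the number of divisors of n, by trial division.
(defun num-divisors (n)
  (loop for d from 1 to n count (zerop (mod n d))))

;; N is maximally divisible iff every smaller number has fewer divisors.
(defun maximally-divisible-p (n)
  (let ((dn (num-divisors n)))
    (loop for m from 1 below n always (< (num-divisors m) dn))))

;; (loop for n from 1 to 60 when (maximally-divisible-p n) collect n)
;; =>  (1 2 4 6 12 24 36 48 60)

For a factored n = 2^a1 * 3^a2 * ... * Pk^ak, d(n) is simply the product of
the (ai + 1)'s, which is where the count of about 4 million divisors for the
example above comes from.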

This is so far the only piece of interesting mathematics, previously unknown, 
that was directly motivated by AM. 

TIME TOPIC

10	Review: AM's task, basic strategy, organization, BEINGs representation
15	Some general issues: 
		numerical weights
		"priming" the system toward some goal
		acquiring new heuristics
		effect of data structure formats and other built-in 
		     	representations on what the system can achieve,
		the name of the system: meaningless vs presumptuous
		characteristics of a task which make it ripe for automation
		discovering vs being led by the nose
		interactions with a user/co-researcher
10	Some specific issues:
		Concepts AM starts with: what, why
		Route to equal-length
		Need for canonical representations
		Proposal of prime numbers
		Converse proposal: divisor-rich numbers
		Why AM can propose the Unique Factorization Theorem,
			for primes, but ⊗4not⊗* the Ratio-of-logs 
			conjecture for divisor-rich numbers.
		Intuition: its role in math research and in AM
25	Discussion
		Philosophical justifications and implications of AM
		Future priorities: intuitions, expanded breadth of
			discovery, expanded depth, better user interface,
			experiments, documentation, public relations,
			theoretical analysis and extraction of key ideas
		AM next year at Stanford: will someone "carry on"?
			More dissertation topics within this project?